Abstract. We combine geometric data analysis and stochastic modeling to describe 
the collective dynamics of complex systems. As an example we apply this approach to 
financial data and focus on the non-stationarity of the market correlation structure. 
We identify the dominating variable and extract its explicit stochastic model. This 
allows us to establish a connection between its time evolution and known historical 
events on the market. We discuss the dynamics, the stability and the hierarchy of the 
recently proposed quasi-stationary market states. 
1. Introduction 

Big data is the buzzword of recent years, reflecting an ever increasing amount of 
electronically available data that demands analysis and interpretation. Our focus is 
on complex dynamical systems such as financial markets, where huge data sets exist 
in the form of multivariate time series. The dynamical behavior of such systems may 
reduce their complexity by self-organization [1]. System variables, which are measured 
as single time series, couple together into a few dominating variables, which accurately 
describe the system dynamics and allow for predictions. The self-organization may 
produce patterns in observed data which are generally difficult to uncover. A wide range 
of data analysis techniques is available and widely used, including graph theoretical 
information filtering [2, 3, 4, 5, 6, 7], data clustering [8, 9, 10, 11, 12, 13] and geometric 
approaches [14, 15, 16, 17, 18]. All these techniques are based on a similarity measure 
between the data points. This approach has a major disadvantage: the temporal order 
of the measured data is neglected. Thus, the system dynamics is not 
explicitly taken into account. On the other hand, dynamical variables of complex 
systems have been successfully described by stochastic processes [19, 20]. In this 
description the variables evolve in time according to deterministic dynamics, which 
gives access to the system's stability and fixed points, and are subject to generally 
non-trivial stochastic fluctuations. Here, we combine the data set analysis with stochastic methods 
in order to capture the full dynamics of the system. We apply our approach to stock 
market data. Similar techniques have proven successful in the description of complex 
dynamical systems [21, 22, 23]. The paper is organized as follows: We present the data 
set and perform a geometric data analysis to uncover the dominating variable in Sec. 2. 
In Sec. 3 we identify the quasi-stationary states of the financial market following 
Ref. [10] and draw connections to known historical events. We present the stochastic 
analysis in Sec. 4 and discuss our results in Sec. 5. 


2. Analyzed Data 


In Sec. 2.1 we introduce our data set and the analyzed quantities. We perform a 
geometric analysis of the data in Sec. 2.2. 


2.1. Observed Quantities 


We analyze daily adjusted closing stock prices $S_i(t)$, $i = 1, \dots, K$, of the K companies in 
the S&P500 Index over the period of 21 years ranging from early 1992 to the end of 
2012. The data is freely available at finance.yahoo.com. To measure the correlations, 
we use the daily returns 

$$r_i(t) = \frac{S_i(t+1) - S_i(t)}{S_i(t)} \qquad (1)$$

and normalize them locally [21] to smooth out trends on very short time scales. We measure 
the time t in trading days. We then calculate the K × K correlation matrices C(t) by 
averaging over a time window of T = 42 trading days, which is moved in one-day steps through the 
data. The elements of C(t) are the Pearson correlation coefficients 

$$C_{ij}(t) = \frac{\langle r_i r_j\rangle_T(t) - \langle r_i\rangle_T(t)\,\langle r_j\rangle_T(t)}{\sigma_i^{(T)}(t)\,\sigma_j^{(T)}(t)} \qquad (2)$$


Here $\sigma_i^{(T)}(t) = \sqrt{\langle r_i^2\rangle_T(t) - \langle r_i\rangle_T^2(t)}$ is the time-dependent volatility and the sample 
average 

$$\langle f\rangle_T(t) = \frac{1}{T}\sum_{s=t-T+1}^{t} f(s) \qquad (3)$$

of a quantity f(t) is evaluated over the T data points ending at t. We note that, in contrast to 
the stock prices $S_i$ and the returns $r_i$, the correlation coefficients $C_{ij}(t)$ are bounded 
quantities. Altogether we obtain N = 5169 correlation matrices. The correlation 
matrices calculated on the short intervals T are noisy. We reduce the noise by averaging 
over the correlation coefficients, which yields the mean correlation coefficient 

$$c(t) = \langle C(t)\rangle_{ij} \qquad (4)$$

Here $\langle\cdot\rangle_{ij}$ denotes the average over all d = (K² − K)/2 = 46971 independent correlation 
coefficients of every correlation matrix C(t), i.e. over all pairs i < j of the K = 307 stocks. 
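The pipeline of Eqs. (1)-(4) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: it omits the local normalization of Ref. [21], uses synthetic prices in place of the S&P500 data, and all function names are hypothetical.

```python
import numpy as np

def daily_returns(prices):
    """Eq. (1): r_i(t) = (S_i(t+1) - S_i(t)) / S_i(t); prices has shape (days, K)."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] - prices[:-1]) / prices[:-1]

def correlation_matrix(window):
    """Eqs. (2)-(3): Pearson correlations of a (T, K) block, using 1/T sample averages."""
    mean = window.mean(axis=0)
    cov = window.T @ window / len(window) - np.outer(mean, mean)
    sigma = np.sqrt(np.diag(cov))          # time-dependent volatilities sigma_i^(T)
    return cov / np.outer(sigma, sigma)

def mean_correlation(C):
    """Eq. (4): average over the d = (K^2 - K)/2 upper-triangle coefficients."""
    iu = np.triu_indices_from(C, k=1)
    return C[iu].mean()

def rolling_mean_correlation(returns, T=42):
    """c(t) for a window of T days moved in one-day steps; the window for t ends at t."""
    return np.array([
        mean_correlation(correlation_matrix(returns[t - T + 1:t + 1]))
        for t in range(T - 1, len(returns))
    ])

# Usage with synthetic prices for K = 5 hypothetical stocks over 300 days
rng = np.random.default_rng(0)
prices = 100 * np.cumprod(1 + 0.01 * rng.standard_normal((300, 5)), axis=0)
returns = daily_returns(prices)
c_series = rolling_mean_correlation(returns, T=42)
```

Computing the covariance from 1/T averages matches Eq. (3); the correlation itself does not depend on whether 1/T or 1/(T − 1) is used, because the normalization cancels in Eq. (2).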

We recall the spectral decomposition 

$$C(t) = \sum_{\alpha=1}^{T} \lambda_\alpha(t)\, u_\alpha(t)\, u_\alpha^\top(t) = \lambda_1(t)\, u_1(t)\, u_1^\top(t) + \sum_{\alpha=2}^{T} \lambda_\alpha(t)\, u_\alpha(t)\, u_\alpha^\top(t) \qquad (5)$$

of the K × K correlation matrix C(t) [17, 18]. Here $\lambda_\alpha(t)$ denotes the α-th eigenvalue of 
C(t), $u_\alpha(t)$ the corresponding normalized eigenvector and $u_\alpha^\top(t)$ its transpose. Since the 
window length T is smaller than K, the rank of C(t) is at most T, and therefore at most the first T 
eigenvalues are non-zero. For our data the first and largest eigenvalue $\lambda_1(t) = \lambda_{\max}(t)$ is 
significantly larger than the other eigenvalues. All components of $u_1(t)$ are approximately equal to 0.05, 
while the components of the other T − 1 eigenvectors spread around zero for every time t. Therefore $u_1(t)$ 
corresponds to the dynamics of the whole market, as in Refs. […]. Hence, averaging over the 
correlation coefficients, 

$$c(t) = \langle C(t)\rangle_{ij} \approx \kappa\, \lambda_{\max}(t), \qquad (6)$$

we recover the largest eigenvalue up to a constant factor. Here 

$$\kappa = \langle u_1(t)\, u_1^\top(t)\rangle_{ij} \approx 1/229 \qquad (7)$$

is an empirical factor which appears due to the noise in the data. The time evolution of 
the largest eigenvalue is strongly correlated with the mean correlation coefficient c(t); 
their Pearson correlation coefficient is 0.998. The quantities $\lambda_{\max}(t)$ and c(t) therefore share the 
same dynamics. We will show in Sec. 2.2 that c(t) describes the direction of largest variance 
in our data. Figure 1(a) shows the time evolution of c(t). We also present 
the time evolution of the S&P500 Index in Fig. 1(b). 
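The relation between Eqs. (4)-(7) can be checked numerically. The sketch below is a toy example on hypothetical one-factor returns, not the market data: it computes $\lambda_{\max}$, $u_1$ and κ. Summing the off-diagonal averages of all modes of Eq. (5) reproduces c exactly, while keeping only the first mode gives the approximation (6).

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, rho = 50, 500, 0.3               # hypothetical sizes and common correlation
market = rng.standard_normal((T, 1))   # common "market mode"
returns = np.sqrt(rho) * market + np.sqrt(1 - rho) * rng.standard_normal((T, K))

C = np.corrcoef(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
lam_max, u1 = eigvals[-1], eigvecs[:, -1]

iu = np.triu_indices(K, k=1)           # the d = (K^2 - K)/2 independent entries
c_mean = C[iu].mean()                  # Eq. (4)
kappa = np.outer(u1, u1)[iu].mean()    # Eq. (7); unchanged under u1 -> -u1

# Eq. (5) term by term: every mode contributes lambda_a * <u_a u_a^T>_ij
c_all_modes = sum(lam * np.outer(u, u)[iu].mean() for lam, u in zip(eigvals, eigvecs.T))
c_first_mode = kappa * lam_max         # Eq. (6)
```

The gap between `c_first_mode` and `c_mean` comes from the modes α ≥ 2, whose eigenvectors are roughly orthogonal to the uniform market mode and therefore contribute only a small negative off-diagonal average.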
[Figure 1 panels: (a) Mean Correlation Coefficient; (b) S&P500 Index]

Figure 1. (a) Time evolution of the mean correlation coefficient c(t). (b) The 
S&P500 Index for the same time period. Dashed lines highlight economically distinct 
time intervals as described in Sec. 3.2. 


2.2. Geometric Approach: Principal Component Analysis 
We identify each correlation matrix C(t) with a correlation vector 

$$\mathbf{c}(t) = \big(c_1(t), c_2(t), \dots, c_d(t)\big)^\top \qquad (8)$$

in the real d-dimensional Euclidean space $\mathbb{R}^d$. Here $c_l(t)$ is the l-th component of $\mathbf{c}(t)$, 
i.e. one of the d independent correlation coefficients $C_{ij}(t)$. We then apply the principal 
component analysis (Pearson […], Hotelling […]) to identify orthogonal and therefore 
uncorrelated one-dimensional subspaces in our time series $c_l(t)$, $l = 1, \dots, d$. 

The first principal component is defined as the line in $\mathbb{R}^d$ with the largest possible 
variance of the data values. Each further principal component is the direction with the largest 
remaining data variance that is orthogonal to all preceding components. The number of principal 
components is smaller than or equal to d. The principal components are spanned by the 
orthogonal eigenvectors $v_i$, $i = 1, \dots, d$, of the symmetric d × d covariance matrix 

$$W = A A^\top. \qquad (9)$$

Here A is the d × N data matrix with the d empirical time series $c_l(t)$ as rows, and $A^\top$ 
denotes its transpose. 

The rank of W is min(d, N). For the full data set d = 46971 exceeds the number N = 5169 of 
time points, so W would be rank deficient and we cannot apply the PCA to our full data. We therefore 
applied the principal component analysis (PCA) to 100 randomly chosen stocks, ending 
up with d = (100² − 100)/2 = 4950 time series of length N = 5169. Figure 2(a) shows the 
distribution of the eigenvector components for the first ten principal components. 
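A minimal sketch of the PCA step of Eq. (9), on hypothetical synthetic correlation vectors whose components share one common time-dependent level plus small independent noise. The rows of A are centred before forming W; this is an assumption, since the text does not state how A is centred. The sketch also computes the projection onto the first component that appears in Eq. (11).

```python
import numpy as np

def principal_components(A):
    """Eq. (9): eigen-decomposition of W = A A^T for a (d, N) matrix A whose
    rows are time series. Returns variances and unit eigenvectors, largest first."""
    A = A - A.mean(axis=1, keepdims=True)      # centre each time series (assumption)
    W = A @ A.T
    variances, vectors = np.linalg.eigh(W)     # ascending order for symmetric W
    order = np.argsort(variances)[::-1]
    return variances[order], vectors[:, order]

# Hypothetical data: d = 45 coefficients (K = 10 stocks) over N = 300 days
rng = np.random.default_rng(2)
d, N = 45, 300
level = 0.3 + 0.1 * np.sin(np.linspace(0, 6 * np.pi, N))    # common mean correlation
A_raw = level + 0.02 * rng.standard_normal((d, N))           # shape (d, N)

variances, vectors = principal_components(A_raw)
v1 = vectors[:, 0] * np.sign(vectors[:, 0].sum())            # fix the arbitrary sign
projection = v1 @ A_raw                                       # <c(t), v1> for every t
```

With a dominant common level, `v1` comes out close to the uniform vector with components $1/\sqrt{d}$, and the projection is close to $\sqrt{d}$ times the mean over the components.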


[Figure 2 panels: (a) Principal Components Analysis; (b) Principal Components Variance. Axis labels: Eigen Vector Components; Number of Principal Component]

Figure 2. (a) The distribution of the first ten normalized principal components and 
(b) their variances normalized to the largest value. 


The components of the first normalized eigenvector are concentrated around a 
constant value of 0.014, while the components of the other nine are symmetrically distributed 
around zero. Therefore the direction with the largest variance in the data values is the 
subspace spanned by the vector 

$$v_1 = \frac{1}{\sqrt{d}}\begin{pmatrix}1\\ 1\\ \vdots\\ 1\end{pmatrix} \approx \begin{pmatrix}0.014\\ 0.014\\ \vdots\\ 0.014\end{pmatrix}. \qquad (10)$$

The variances of the data values for the first ten principal components are shown in Fig. 
2(b). The variance of the first principal component is much larger than the others. 
Figure 3. (Color online) Data set projected onto the first three principal components. 
Different colors highlight different market states as explained in Sec. 3.1. 


The correlation matrices C(t) from our data set, seen as vectors $\mathbf{c}(t) \in \mathbb{R}^d$, are thus 
distributed along $v_1$. Figure 3 shows the projection of our data onto the first three 
principal components in a scatter plot. The distribution of the data points along the 
first principal component is dominant. The contribution of the correlation matrix C(t) 
to the first principal component at time t is given by the scalar product 

$$\langle \mathbf{c}(t), v_1\rangle = \frac{1}{\sqrt{d}}\sum_{i=1}^{d} c_i(t) = c(t)\sqrt{d}, \qquad (11)$$

which turns out to be the mean correlation coefficient c(t) times the fixed number $\sqrt{d}$. The 
dynamics of the market is therefore dominated by the movement along $v_1$, which is given 
by c(t). Eq. (11) confirms the spectral analysis results discussed in Sec. 2.1. We note 
that the spectral analysis of the correlation matrix C(t) is the principal component analysis 
of the standardized returns 

$$\hat r_i(t) = \frac{r_i(t) - \langle r_i\rangle_T(t)}{\sigma_i^{(T)}(t)}, \qquad (12)$$

treated as elements of $\mathbb{R}^K$. Since all components of $u_1(t)$ are approximately equal, the 
projection of (12) onto the first principal component in $\mathbb{R}^K$ at time t is proportional to the 
unweighted average of the $\hat r_i(t)$. 
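The statement that the spectral analysis of C(t) is a PCA of the standardized returns (12) can be verified on a single window. In this sketch, on hypothetical one-factor returns, the covariance matrix of the standardized returns (with 1/T normalization) coincides with the Pearson correlation matrix, and the projection onto the market mode $u_1$ tracks the plain average of the standardized returns.

```python
import numpy as np

rng = np.random.default_rng(3)
T, K = 42, 8                                   # one window of T days, K hypothetical stocks
window = 1.5 * rng.standard_normal((T, 1)) + rng.standard_normal((T, K))

# Eq. (12): standardise each stock with its window mean and volatility (ddof=0, i.e. 1/T)
z = (window - window.mean(axis=0)) / window.std(axis=0)
C = z.T @ z / T                                # covariance of z = correlation matrix (2)

eigvals, eigvecs = np.linalg.eigh(C)
u1 = eigvecs[:, -1] * np.sign(eigvecs[:, -1].sum())   # market mode with positive sign
first_pc = z @ u1                              # first principal component of z per day
plain_average = z.mean(axis=1)                 # unweighted average over the K stocks
```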

The projections $\langle\mathbf{c}(t), v_2\rangle$ and $\langle\mathbf{c}(t), v_3\rangle$ describe the system dynamics along the second 
and third principal components and are shown in Fig. 4. 


3. Market States: Distinct Periods of the Market 


We cluster the data following Ref. [10] and identify the quasi-stationary states of the 
financial market, which we present in Sec. 3.1. We connect the characteristic states of 
the market to known historical events in Sec. 3.2. 
[Figure 4 panel title: Principal Component Analysis]

Figure 4. (Color online) Time evolution of the projections $\langle\mathbf{c}(t), v_2\rangle$ (solid, black) and 
$\langle\mathbf{c}(t), v_3\rangle$ (dashed, red) onto the second and third principal components, normalized by 
$\sqrt{d}$. 


3.1. Market States 


[Figure 5 panel title: Market States]

Figure 5. Time evolution of the market states. Dashed lines highlight economically 
distinct time intervals as described in Sec. 3.2. 


In the previous section we showed that our data is spread along a few dominating 
subspaces in $\mathbb{R}^d$. To quantify the similarity between any two correlation matrices $C(t_a)$ 
and $C(t_b)$ we calculate the distance 

$$D_{ab} = \| C(t_a) - C(t_b) \| = \| \mathbf{c}(t_a) - \mathbf{c}(t_b) \| \qquad (13)$$

via the Euclidean norm on $\mathbb{R}^d$, normalized by $\sqrt{d}$. 













[Figure 6: clustering tree with leaves, from top to bottom: State 8 (0.58), State 7 (0.37), State 6 (0.26), State 5 (0.21), State 4 (0.2), State 1 (0.15), State 3 (0.1), State 2 (0.09)]

Figure 6. Clustering tree of the market states. The mean value of c(t) 
within each state is given in parentheses. 

As the next step we use the bisecting k-means clustering algorithm […]. At the 
beginning of the clustering procedure all correlation matrices are considered as 
one cluster, which is then divided into two subclusters using the k-means algorithm 
with k = 2. For each cluster α we then calculate its cluster center 

$$\mathbf{c}_\alpha = \frac{1}{N_\alpha}\sum_{t\in\alpha}\mathbf{c}(t), \qquad (14)$$

which is the mean correlation matrix in this cluster. Here $N_\alpha$ denotes the number of 
cluster elements and $t \in \alpha$ symbolically denotes all times t for which $\mathbf{c}(t)$ is in the 
cluster α. The separation procedure is repeated until the cluster size 

$$R_\alpha = \frac{1}{N_\alpha}\sum_{t\in\alpha}\| \mathbf{c}(t) - \mathbf{c}_\alpha \| \qquad (15)$$

is smaller than a given threshold for every cluster α. We choose this threshold for the mean 
distance to be 0.164, which yields 8 clusters as in Ref. [10]. The market is said to be 
in market state α at time t if the corresponding correlation matrix C(t), and hence 
the correlation vector $\mathbf{c}(t)$, is in the cluster α. The time evolution of the market states 
is shown in Fig. 5, and Fig. 6 shows the corresponding clustering tree. The state 
occupied on the first day of our data is labeled 1. The remaining states are labeled 
in ascending order of the mean value of c(t) within the states, as shown in Fig. 6. We group the 
states into three main classes: states 1, 2 and 3 are calm states, states 4, 5 and 6 are 
intermediate states, and states 7 and 8 are turbulent states. The financial market evolves 
between these different states. 

New states form and existing states vanish in the course of time. For example, the first 
four years are dominated by states 1 and 2, whereas in the last four years mainly 
states 7 and 8 are occupied. 
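The bisecting k-means procedure of Eqs. (13)-(15) can be sketched as follows. This is a minimal NumPy illustration, not the implementation behind Fig. 6: the k = 2 split uses plain Lloyd iterations with random initial centres, distances are normalized by $\sqrt{d}$ as in Eq. (13), and the synthetic data and the threshold of 0.1 are hypothetical (the text uses 0.164 on the market data). The resulting cluster labels are arbitrary integers, not the state numbering of Fig. 6.

```python
import numpy as np

def two_means(X, rng, n_iter=100):
    """Split the rows of X into two groups with Lloyd's k-means, k = 2."""
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        if labels.min() == labels.max():          # one group became empty
            break
        new_centers = np.array([X[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_size(X):
    """Eq. (15): mean distance to the cluster centre (14), normalised by sqrt(d)."""
    center = X.mean(axis=0)
    return np.linalg.norm(X - center, axis=1).mean() / np.sqrt(X.shape[1])

def bisecting_kmeans(X, threshold, seed=0):
    """Split clusters until every cluster size R_alpha is below `threshold`."""
    rng = np.random.default_rng(seed)
    pending, final = [np.arange(len(X))], []
    while pending:
        idx = pending.pop()
        if len(idx) < 2 or cluster_size(X[idx]) < threshold:
            final.append(idx)
            continue
        labels = two_means(X[idx], rng)
        parts = [idx[labels == j] for j in (0, 1)]
        if min(len(p) for p in parts) == 0:       # split failed, keep cluster as is
            final.append(idx)
        else:
            pending.extend(parts)
    states = np.empty(len(X), dtype=int)
    for k, idx in enumerate(final):
        states[idx] = k
    return states

# Hypothetical correlation vectors: three regimes with mean levels 0.1, 0.4, 0.7
rng = np.random.default_rng(5)
truth = np.repeat([0, 1, 2], 60)
X = np.array([0.1, 0.4, 0.7])[truth][:, None] + 0.02 * rng.standard_normal((180, 10))
states = bisecting_kmeans(X, threshold=0.1)
```

Because k-means with k = 2 can cut a regime in half, the procedure may return more clusters than regimes; every returned cluster, however, satisfies the size criterion (15).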

3.2. Distinct Time Periods 

We divide the entire time period into six dynamically and economically distinct intervals. 

(i) Early 1992 to spring of 1996: in this rather calm period c(t) varies between 0 and 
0.2. The S&P500 Index continuously grows with moderate volatility. The market 
mainly occupies the first and the second state. 

(ii) From spring 1996 until spring 2000: the range that c(t) explores as well as the 
S&P500 Index increase drastically. The volatility also becomes larger. The increase 
of c(t) is explained by the appearance of strongly correlated industrial sectors 
during this period, especially the technology sector. Market state two almost 
disappears and the market jumps mainly between states five and one. We note that 
the fifth state appears only during this period. 

(iii) Spring 2000 to the second half of 2003: this period fully covers the dot-com bubble 
and is known as a very turbulent time in financial markets due to the dot-com crisis. 
The S&P500 Index drops continuously, losing about half of its value. The mean 
correlation coefficient reaches its maximum within this period at 0.48. At the beginning of the crisis, 
state 3 appears for about one year. This state appears only once during the entire 
time period. In the second half of the interval the market switches between states four and 
six and occupies state seven by the end of 2002. This period includes the market 
response to the 9/11 attacks. 

(iv) From the second half of 2003 until fall 2007: this period covers the four years 
preceding the recent global financial crisis, ending about one year before the 
collapse of Lehman Brothers. As seen from the S&P500 Index in Fig. 1, the market 
seems to recover after the dot-com crisis, but c(t) does not calm down and strongly 
fluctuates around a mean value of 0.28. The market jumps between states four, 
six and seven. State six is occupied mainly during this interval. 

(v) From October 2007 until March 2009: this period covers the late-2000s financial 
crisis. The S&P500 Index drops continuously and loses approximately half of its 
value. The mean correlation coefficient peaks sharply at 0.67. The market is 
mainly in state seven and occupies the eighth state by the end of 2008. 

(vi) March 2009 to the end of 2012: the market seems to slowly recover as the S&P500 Index 
grows again. The growth is interrupted by drastic drops. This is reflected in high 
peaks of c(t), which reaches its maximum value of 0.77 over the analyzed 21 years. The 
mean correlation coefficient does not relax to the values it had before the crisis. The 
market switches between states seven and eight and decays for short times into 
states four and six. 
4. Stochastic Analysis 


We describe the stochastic process used to model c(t) in Sec. 4.1. In Sec. 4.2 we explain 
how the explicit model is extracted from the time series. We describe the stochastic 
analysis of the market states in Sec. 4.3. 


4.1. Stochastic Processes 

We model c(t) as a stochastic process described by a Langevin equation 

$$\frac{d}{dt}c(t) = f(c,t) + g(c,t)\,\Gamma(t), \qquad (16)$$

i.e. a stochastic differential equation (SDE) for the variable $c(t) \in \mathbb{R}$. Here f is the 
deterministic part of (16), the drift function, and g is the diffusion function, which 
defines the stochastic part of (16). $\Gamma(t)$ is δ-correlated Gaussian white noise with 
$\langle\Gamma(t)\rangle = 0$ and $\langle\Gamma(t_1)\Gamma(t_2)\rangle = \delta(t_1 - t_2)$. We note that for the dimensionless variable 
c(t) the drift function has the dimension of inverse time and the diffusion function has the 
dimension of inverse square root of time. 
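To make the Langevin equation (16) concrete, the following sketch integrates it with the Euler–Maruyama scheme, which converges to the Itô solution adopted below. The drift is a hypothetical linear restoring force with an assumed rate θ and level μ; the diffusion function takes the form (25) with the parameter values (26)-(28) reported in Sec. 5.1. Time is measured in trading days, so dt = 1.

```python
import numpy as np

def euler_maruyama(f, g, c0, n_steps, dt=1.0, seed=0):
    """Ito discretisation of Eq. (16): c_{k+1} = c_k + f(c_k) dt + g(c_k) sqrt(dt) xi_k."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(n_steps)          # discretised white noise Gamma(t)
    c = np.empty(n_steps + 1)
    c[0] = c0
    for k in range(n_steps):
        c[k + 1] = c[k] + f(c[k]) * dt + g(c[k]) * np.sqrt(dt) * xi[k]
    return c

theta, mu = 0.05, 0.3                   # hypothetical drift parameters (1/td, dimensionless)
A, c_min, c_max = 0.0245, 0.042, 0.918  # diffusion parameters of Eqs. (26)-(28)

def drift(c):
    return -theta * (c - mu)            # linear drift with a single fixed point at mu

def diffusion(c):
    # Eq. (25); clipping guards against the square root of a negative number
    return A * np.sqrt(max((c - c_min) * (c_max - c), 0.0))

path = euler_maruyama(drift, diffusion, c0=0.2, n_steps=5000)
```

Evaluating g at the start of each step, before the noise increment is drawn, is exactly what makes this the Itô rather than the Stratonovich discretization.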


The solution of (16) is defined in terms of stochastic integrals, which depend on 
the choice of the discretization [25, 26, 27]. Throughout this paper we use Itô's choice 
(see Itô's interpretation of SDEs [25, 28]). The advantage of Itô's definition is that the 
diffusion term g is uncorrelated with the Gaussian white noise, $\langle g(c,t)\Gamma(t)\rangle = 0$. 
The drift and diffusion terms can therefore be obtained as conditional moments 

$$f(c,t) = \lim_{\tau\to 0}\frac{1}{\tau}\big\langle c(t+\tau) - c(t)\big\rangle\Big|_{c(t)=c}, \qquad (17)$$

$$g^2(c,t) = \lim_{\tau\to 0}\frac{1}{\tau}\big\langle \big(c(t+\tau) - c(t)\big)^2\big\rangle\Big|_{c(t)=c}. \qquad (18)$$

Here c denotes the value of the stochastic variable c(t) at which the drift 
or the diffusion is evaluated; only at this point do we distinguish between c(t) and a 
particular numerical value c. The averages in Eqs. (17) and (18) are performed over all 
realizations of c(t) for which the condition c(t) = c holds. These equations therefore 
express the time derivatives of the mean displacement and of the mean squared displacement of c(t) at c. 

Expressions (17) and (18) allow one to estimate the drift and diffusion directly from 
the empirical data as shown in Refs. [30, …] and sketched below; see Refs. […] 
for applications. In the present work we model c(t) by an Itô stochastic process and 
estimate the deterministic as well as the stochastic part of the corresponding SDE from 
the empirical time series. 


4.2. Estimation of the Conditional Moments 

For the estimation of the drift and the diffusion directly from the data set we mainly 
follow Refs. [30, …, 32]. Here we briefly sketch the estimation procedure for the 
drift function, i.e. the first conditional moment (17), as the estimation of the diffusion 
function (18) works analogously. We first introduce a new function 

$$M_c(\tau) = \frac{1}{\tau}\big\langle c(t+\tau) - c(t)\big\rangle\Big|_{c(t)=c}, \qquad (19)$$

for which the drift function 

$$f(c,t) = \lim_{\tau\to 0} M_c(\tau) \qquad (20)$$

is obtained in the limit τ → 0. We note that we dropped the time variable t in the argument of 
M in Eq. (19) for brevity. To estimate $M_c(\tau)$ at fixed c as a function of τ, we 
divide the time series c(t) into bins with equal numbers of data points. For every bin I 
the function $M_c(\tau)$ is then estimated as 

$$M_{c_I}(\tau) = \frac{1}{\tau}\big\langle c(t+\tau) - c(t)\big\rangle\Big|_{c(t)\in I}. \qquad (21)$$

Here $c_I$ is the mean value of c(t) in bin I and the average is performed over all data 
in this bin. We note that for the empirical data this estimation can only be done for 
discrete values τ = 1, 2, 3, …. We then fit a second-order polynomial in τ to the 
empirically estimated values of (21) and extract the desired value of the drift at $c_I$ as 
the constant coefficient of the fitted function. The estimation of (19) is only possible 
for the realized values of the empirical time series c(t). 
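The binning procedure of Eqs. (19)-(21) can be sketched as below. It is an illustration under stated assumptions rather than the authors' estimator: ten equal-count bins, τ = 1, …, 5, and a least-squares quadratic in τ whose constant coefficient is taken as the τ → 0 limit. The argument `order` (a hypothetical name) selects the first moment, the drift of Eq. (17), or the second moment, the squared diffusion of Eq. (18). The test series is a synthetic linear-drift process with known parameters.

```python
import numpy as np

def conditional_moment(c, order=1, n_bins=10, taus=(1, 2, 3, 4, 5)):
    """Per bin: constant term of a quadratic fit of <(c(t+tau) - c(t))^order>/tau in tau."""
    c = np.asarray(c, dtype=float)
    taus = np.asarray(taus)
    start = np.argsort(c[:-taus.max()])           # times t for which all c(t + tau) exist
    centers, values = [], []
    for idx in np.array_split(start, n_bins):     # bins with equal numbers of points
        m = [np.mean((c[idx + tau] - c[idx]) ** order) / tau for tau in taus]  # Eq. (21)
        coeffs = np.polyfit(taus, m, deg=2)       # second-order polynomial in tau
        centers.append(c[idx].mean())             # c_I, the mean of c(t) in bin I
        values.append(coeffs[-1])                 # constant coefficient = value at tau = 0
    return np.array(centers), np.array(values)

# Synthetic series with linear drift -theta (c - mu) and constant noise amplitude g0
rng = np.random.default_rng(4)
theta, mu, g0, n = 0.1, 0.3, 0.01, 200_000
noise = rng.standard_normal(n)
series = np.empty(n)
series[0] = mu
for k in range(n - 1):
    series[k + 1] = series[k] - theta * (series[k] - mu) + g0 * noise[k]

centers, drift_est = conditional_moment(series, order=1)
_, g_squared_est = conditional_moment(series, order=2)
slope = np.polyfit(centers, drift_est, deg=1)[0]   # roughly -theta
```

The estimated drift is close to linear in c with a negative slope near −θ, and the square root of the second moment is close to $g_0$. Small deviations remain because the synthetic series is itself a discrete-time process.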


Instead of analyzing the drift function (17) itself, it is more convenient to consider 
the potential function 

$$V(c,t) = -\int^{c} f(x,t)\,dx, \qquad (22)$$

defined as the negative primitive integral of f. The minus sign is a convention. The 
dynamics of the system is encoded in the shape of V(c,t): the local minima of the 
potential function correspond to quasi-stable equilibria, or quasi-stable fixed points, 
around which the system fluctuates. In contrast, local maxima correspond to unstable 
fixed points. We note that potential functions are defined up to an additive constant. 
For the dimensionless variable c(t) the potential function has the dimension of inverse 
time. 
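Numerically, Eq. (22) amounts to a cumulative integral of the estimated drift over the bin centres. The sketch below uses the trapezoidal rule, which is our choice since the text does not specify the integration scheme, and locates quasi-stable and unstable fixed points as interior local minima and maxima. The double-well drift in the usage example is hypothetical.

```python
import numpy as np

def potential(c_values, drift_values):
    """Eq. (22): V(c) = -integral of f, via the cumulative trapezoidal rule.
    V is defined up to an additive constant; here V = 0 at the smallest c."""
    order = np.argsort(c_values)
    c = np.asarray(c_values, dtype=float)[order]
    f = np.asarray(drift_values, dtype=float)[order]
    increments = 0.5 * (f[1:] + f[:-1]) * np.diff(c)
    return c, -np.concatenate(([0.0], np.cumsum(increments)))

def local_minima(c, V):
    """Interior grid points lower than both neighbours: quasi-stable fixed points."""
    inner = (V[1:-1] < V[:-2]) & (V[1:-1] < V[2:])
    return c[1:-1][inner]

def local_maxima(c, V):
    """Interior grid points higher than both neighbours: unstable fixed points."""
    inner = (V[1:-1] > V[:-2]) & (V[1:-1] > V[2:])
    return c[1:-1][inner]

# Hypothetical drift with stable fixed points at 0.2 and 0.5 and an unstable one at 0.35
grid = np.linspace(0.1, 0.6, 101)
drift = -(grid - 0.2) * (grid - 0.35) * (grid - 0.5)
c, V = potential(grid, drift)
```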


4.3. Market States Dynamics 

To quantify the market dynamics while the market is in a fixed market state α, we restrict the 
estimation of (21) to the data points 

$$\{c(t),\, c(t+\tau) \mid t \in \alpha\} \qquad (23)$$

for each state α. We therefore consider only displacements along the first principal 
component within the market states; no state transitions are allowed. Potential 
functions estimated this way provide information about the stability of the market 
states and reveal their fixed points. 


As mentioned in Sec. 3.1, we group the states into three main classes 
according to the hierarchical structure shown in Fig. 6. We estimate the potential 
functions for each class A by evaluating only the data points 

$$\{c(t),\, c(t+\tau) \mid t \in A\}. \qquad (24)$$

Here t ∈ A symbolically denotes all time points at which the market is in a state of 
class A. For example, the market might be in state 1 at time t and in state 2 at 
time t + τ, as these two clusters belong to the same class. We therefore consider only 
displacements within A and allow for transitions between states of the same class. 
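The data selection of Eqs. (23) and (24) can be expressed as a mask over the sequence of state labels. In the sketch below, `members` is a single state for Eq. (23) or all states of a class for Eq. (24). Reading "no state transitions are allowed" as requiring every label between t and t + τ to lie in `members` is our assumption; `strict=False` checks only the two endpoints. Function and variable names are hypothetical.

```python
import numpy as np

def displacement_pairs(c, labels, members, tau, strict=True):
    """Return arrays c(t), c(t + tau) for the times t selected by Eq. (23) or (24)."""
    c = np.asarray(c, dtype=float)
    inside = np.isin(labels, list(members))
    n = len(c) - tau                                   # admissible t = 0, ..., n - 1
    if strict:
        # number of in-set labels in each window [t, t + tau]
        counts = np.convolve(inside.astype(int), np.ones(tau + 1, dtype=int), mode="valid")
        keep = counts == tau + 1
    else:
        keep = inside[:n] & inside[tau:]
    t = np.nonzero(keep)[0]
    return c[t], c[t + tau]

# Hypothetical sequence of market states and values of c(t)
labels_demo = np.array([1, 1, 2, 2, 1, 7, 7, 1])
c_demo = np.arange(8) / 10
within_state = displacement_pairs(c_demo, labels_demo, {1}, tau=1)       # Eq. (23)
within_class = displacement_pairs(c_demo, labels_demo, {1, 2}, tau=2)    # Eq. (24)
```

Passing the pairs selected this way to an estimator of Eq. (21) yields the state-wise or class-wise drift and hence the corresponding potential functions.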


5. Results 


We show the estimated diffusion function (18) in Sec. 5.1 and discuss the estimated 
potential function (22) in Sec. 5.2. In Sec. 5.3 we take a closer look at the dot-com 
bubble. A detailed study of the market states dynamics is presented in Sec. 5.4. 


5.1. Diffusion Term 

To quantify the time dependence of the diffusion function g(c,t), we estimate the second 
conditional moment (18) on a time window of four trading years (1008 trading days), 
which is moved in steps of two trading months (42 trading days). Altogether we obtain 
100 estimates of g(c,t), which we present in Fig. 7. As explained in Sec. 4.2, the 
estimation is only possible for the realized values of c(t). We therefore put all estimated 
values in a single diagram. We then fit the estimated values by the time-independent 
function 

$$g(c) = A\sqrt{(c - c_{\min})(c_{\max} - c)}, \qquad (25)$$

which fits our data well, see Fig. 7. The diffusion function (25) is widely used to model 
stochastic correlations [33, 34, 35, 36, 37], as it restricts the values of the correlation to 
the range $[c_{\min}, c_{\max}]$. From the estimated parameters 

$$A = 0.0245\ \mathrm{td}^{-1/2}, \qquad (26)$$

$$c_{\min} = 0.042, \qquad (27)$$

$$c_{\max} = 0.918, \qquad (28)$$

where td denotes trading days, we obtain the characteristic time scale of the system 

$$t_0 = \frac{1}{A^2} \approx 1666 \text{ trading days}, \qquad (29)$$

which turns out to be approximately one third of the analyzed period. For consistency 
we also estimate (18) for the entire time series c(t) at once, as shown in Fig. 7. We note that 
we fitted (25) only to the data obtained on the sliding window. 
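The fit of Eq. (25) and the time scale (29) can be reproduced in a few lines. One convenient approach, assumed here rather than taken from the text, uses the fact that $g^2$ is a quadratic polynomial in c whose roots are $c_{\min}$ and $c_{\max}$, so an ordinary least-squares fit of $g^2$ suffices; note that this weights the data differently from a direct nonlinear fit of g. The check uses noise-free values generated from (26)-(28).

```python
import numpy as np

def fit_bounded_diffusion(c, g):
    """Fit Eq. (25) through g^2 = -A^2 c^2 + A^2 (c_min + c_max) c - A^2 c_min c_max."""
    p2, p1, p0 = np.polyfit(c, np.asarray(g, dtype=float) ** 2, deg=2)
    A = np.sqrt(-p2)                                     # requires p2 < 0 (downward parabola)
    c_min, c_max = np.sort(np.roots([p2, p1, p0]).real)  # zeros of g^2
    return A, c_min, c_max

# Noise-free values of Eq. (25) with the parameters (26)-(28)
c_grid = np.linspace(0.1, 0.8, 30)
g_grid = 0.0245 * np.sqrt((c_grid - 0.042) * (0.918 - c_grid))

A, c_min, c_max = fit_bounded_diffusion(c_grid, g_grid)
t0 = 1 / A**2                                            # Eq. (29), in trading days
```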
[Figure 7 panel title: The Diffusion Function]

Figure 7. The diffusion function estimated on the sliding window (crosses, +). The 
circles (o) show the diffusion function estimated on the entire time period at once. The 
solid curve shows the fitted function (25). Only values estimated on the sliding 
windows are used for the fit. 


5.2. Time Evolution of the Potential Functions in the Entire Time Period 

To quantify the time dependence of the drift function f(c,t), we estimate the first 
conditional moment (17) on a time window of four trading years (1008 trading days), 
which is moved in steps of two trading months (42 trading days). Altogether we 
obtain 100 estimates of f(c,t). We then calculate the potential functions (22), which 
are presented in Figs. 8(a) and 8(b). The dates mark the time points in the middle of the 
estimation time windows. In contrast to the diffusion function, the drift function turns 
out to be time-dependent. Since the potential function (22) is defined only up to an additive 
constant, it is difficult to present many curves graphically in a single diagram. 
To work around this problem we set 

$$V(c_0, t) = 0, \qquad (30)$$

where $c_0$ denotes the value at which V(c,t) has its minimum within the first half of its values. 
In this representation, the deeper a potential function is, the higher are its boundaries. 
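The representation (30) is a simple vertical shift of each curve. In the sketch below, "the first half of its values" is read as the first half of the c-grid on which V(c,t) is estimated; this is our interpretation, and the quadratic example is hypothetical.

```python
import numpy as np

def normalise_potential(c, V):
    """Shift V so that V(c_0) = 0, with c_0 the minimiser within the lower half of the grid."""
    c = np.asarray(c, dtype=float)
    V = np.asarray(V, dtype=float)
    half = max(len(V) // 2, 1)
    i0 = int(np.argmin(V[:half]))
    return V - V[i0], c[i0]

c_grid = np.linspace(0.0, 1.0, 11)
V_shifted, c0 = normalise_potential(c_grid, 5.0 + (c_grid - 0.2) ** 2)
```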

Figure 8(a) shows the results from early 1992 to the end of 2004. The distinct 
time periods described in Sec. 3.2 are clearly recognizable in the shape of the potential 
function. It is flat and approximately constant at the beginning. It gets deeper in the 
middle of the period, during the turbulent time in 1997-98. The two local minima indicate 
instabilities of the market. By the end of the period the dot-com crisis is reflected in 
the shape of V(c,t): its boundaries get higher and it has many deep minima at high 
values of c(t). 







Figure 8. (Color online) Time evolution of the potential function (22) from early 
1992 to the end of 2004 (a) and from 2002 to the end of 2012 (b), estimated on a time 
window of four trading years which is moved in steps of two trading months. The dates 
mark the time points in the middle of the estimation time windows. Representation is 
according to Eq. (30). 


Figure 8(b) shows the results from early 2002 to the end of 2012. Similar to the 
previous case, V(c,t) is flat and constant during the relatively calm period in the early 
2000s. It changes its shape drastically in the second half of 2007 and develops a deep 
local minimum around c(t) ≈ 0.4. The boundaries get very high during the late-2000s 
financial crisis. We note that V(c,t) does not become flat after 2010. 

We showed that c(t) is described by a stochastic process (16) with a 
time-independent diffusion term and a time-dependent drift function. In Sec. 2 we showed 
that the mean correlation coefficient is the dominating variable of the collective market 
dynamics. The non-stationarity of the potential function is therefore explained by 
deterministic changes in the collective correlation structure of the market. 


5.3. Zooming into the Dot-com Bubble 

In the previous section we showed that the market evolves in time, switching back and 
forth between different market states. As an example of a state transition we estimate 
V(c,t) in the period from early 1999 to early 2006. The interval covers the dot-com 
bubble. To achieve a higher time resolution we perform the estimation on a time window 
of two trading years (512 trading days), sliding it in steps of one trading month (21 
trading days). Figure 9 shows the time evolution of the estimated potential function. 
It is flat at the beginning, when the market is mainly in states 1 and 2, see Fig. 5. 
During the crisis the values of c(t) increase. Therefore the estimated potential function 
moves along the c axis. A deep minimum builds up and the boundaries get higher. The 
market jumps through states 3, 4 and 6, ending up in state 7 by the end of 2002. 
By the end of 2003 the market settles into state 6 with only short jumps into states 
4 and 1. The potential function becomes constant in time but has changed its shape compared 
to the pre-crisis period. The market therefore jumps from a stable state to a turbulent 
state and then down to another stable state. 

Figure 9. (Color online) Time evolution of the potential function (22) in the period 
from early 1999 to early 2006, which covers the dot-com bubble. The estimation is 
done on a time window of two trading years which is moved in steps of one trading 
month. The dates mark the time points in the middle of the estimation time windows. 
Representation is according to Eq. (30). 


5.4. Market States Dynamics: Stability, Hierarchy and State Transitions 

In the previous sections we showed that the mean correlation coefficient is described by a 
stochastic process (16) with the time-independent diffusion function (25) and a 
time-dependent drift function. In particular, calm and turbulent periods can be distinguished 
by the shape of V(c,t). To quantify the market dynamics in a given market state we 
estimate the potential function for the data points (23). Thus, we only account for 
displacements within a fixed market state. 

The time series for states 3, 4 and 5 are too short for a reliable estimation of the potential 
function, so we combined the time series of states 2 and 3 as well as those of states 4 and 5. We 
denote the resulting states by 2+3 and 4+5, respectively. As shown in Fig. 6, each of these 
pairs consists of states of the same class. Figures 10(a)-(c) show the resulting potential 
functions for each market state. 

[Figure 10: panels (a)-(d); horizontal axes: c(t)]

Figure 10. (a)-(c) The potential function (22) of the displacements within the states 
(23) (filled circles and triangles). The potential function of the displacements within the 
three classes (24) is represented by the envelope line with crosses (+). (d) The overall 
potential landscape V(c) estimated on the entire time series c(t) at once. 

Potential functions provide information about the stability of market states. This 
notion of stability does not refer to the time the market spends in a 
certain state, but is given by the dynamics of the market. States 1, 2+3, 6 and 8 are 
stable states, as their potential functions have a single deep minimum and therefore 
a clearly defined fixed point. State 8 mainly appears during the latest financial crisis 
and represents a strong collective correlation on the market. In contrast, state 7 is 
very unstable. Not only does its potential function have two local minima, it is also the 
deepest of all state potentials. The correlation structure is non-stationary within market state 7. The 
combined state 4+5 has a half-open potential function. States 4 and 5 are intermediate 
states between calm and turbulent periods, see Fig. 5. We note that within stable states, 
c(t) is described by the SDE (16) with the diffusion function (25) and a linear drift function. 

[Figure 11 panels: (a) Step Distribution; (b) Increment Distribution]

Figure 11. (Color online) Empirical histograms of (a) the steps (31) and (b) the 
increments (32) within states (black, solid) and during state transitions (red, dashed). 
c(t) is described by SDE (16) with the diffusion function (25) and a linear drift function. 

In Sec. 3.1 we grouped the market states into three classes according to the 
clustering tree, see Fig. 6. Not all market states appear simultaneously in a given 
time interval, as shown in Fig. 5. The first four years of the analyzed time period are 
dominated by states 1 and 2, which belong to the first class. In the last four years 
essentially only states 7 and 8 appear, which form the third class. To quantify the 
hierarchical structure of the states we estimate V(c,t) for the points (24). We therefore 
account for displacements within the classes, including state transitions. The resulting 
potential functions for the three classes are shown in Figs. 10(a)-(c). These curves 
envelop the potential functions of the market states of the corresponding class. 

Similar to the envelopes, we estimated V(c) on the entire time period at once, as 
shown in Fig. 10(d). The potential function of each market state has a distinct position 
along the first principal component, i.e. a distinct value of c(t). We therefore conclude 
that, while the market is in a given (stable) state, the mean correlation coefficient 
fluctuates around a mean value, which is defined by the minimum of the potential 
function, see Figs. 10 and 6. As we showed in Sec. 2.2, the movement along the first 
principal component is given by the time evolution of c(t). Hence the market dynamics 
within a fixed state is given by the movement along the second and higher principal 
components, see Figs. 3 and 4. Large changes of the mean correlation coefficient yield 
state transitions. The market is therefore "hopping" from state to state in the potential 
landscape, which is shown in Fig. 10(d). For consistency we calculate the daily steps 

$$S(t) = \| \mathbf{c}(t+1) - \mathbf{c}(t) \| \qquad (31)$$

of the market and the absolute increments 

$$\Delta(t) = | c(t+1) - c(t) | \qquad (32)$$

of c(t). Figures 11(a) and 11(b) show the distributions of the steps (31) and the increments 
(32) within market states compared to the jumps during state transitions. Both the 
steps and especially the increments are on average larger during state transitions than 
within states, as we claimed. 
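Eqs. (31) and (32), and the split into within-state moves and state transitions, can be computed directly from the correlation vectors and the state labels. The sketch below normalizes the step (31) by $\sqrt{d}$, following the convention of Eq. (13), and uses the fact that c(t) is the average of the components of $\mathbf{c}(t)$. The toy input is hypothetical.

```python
import numpy as np

def steps_and_increments(vectors, labels):
    """Daily steps (31), absolute increments (32), and a mask of state transitions."""
    vectors = np.asarray(vectors, dtype=float)       # shape (N, d): rows are c(t)
    labels = np.asarray(labels)
    diff = vectors[1:] - vectors[:-1]
    steps = np.linalg.norm(diff, axis=1) / np.sqrt(vectors.shape[1])
    increments = np.abs(diff.mean(axis=1))           # |c(t+1) - c(t)|, c(t) = mean of c_i(t)
    transition = labels[1:] != labels[:-1]
    return steps, increments, transition

vectors = [[0.10, 0.12], [0.11, 0.11], [0.50, 0.46]]
labels = [1, 1, 8]
steps, increments, transition = steps_and_increments(vectors, labels)
```

Histograms of `steps[transition]` versus `steps[~transition]` (and likewise for the increments) correspond to the two curves in each panel of Fig. 11.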


6. Conclusion 

The combination of geometric data analysis and stochastic methods sheds new light on 
the collective dynamics of complex systems. We applied these techniques to stock market 
data and evaluated the correlation structure on a sliding time window for a period of 
21 years. The collective market dynamics in terms of the principal components is given 
by the average correlation coefficient. We extracted the underlying stochastic process 
which turns out to have a time-independent stochastic term and a time-dependent 
deterministic term. The latter is represented graphically as a potential landscape and 
provides information on stability and system fixed points. We established the connection 
between distinct historical periods on the market and the time evolution of the potential 
function. The non-stationary market dynamics can be attributed to changes in the 
deterministic part of the collective market dynamics. We identified quasi-stationary 
states of the market following Ref. [10] and distinguished three main classes of market 
dynamics: calm, intermediate and turbulent states. To quantify the market states 
dynamics we estimated the potential functions, accounting only for displacements within 
a fixed state. In a given state the average correlation fluctuates around a distinct mean 
value, which defines a fixed point. The market dynamics within a market state is given 
by the movement along higher principal components. State transitions are reflected in 
large changes of the average correlation and correspond to the hopping in the potential 
landscape. Our results are consistent with the random matrix approach of Ref. [38] and 
contribute to a better overall understanding of market dynamics. While we highlighted 
the application to financial data in this paper, our approach should prove useful for the 
study of any quasi-stationary complex system. 